week 9: advanced methods

measurement

library(tidyverse)
library(psych)
library(cowplot)
library(patchwork)
library(here)
library(brms) 
library(tidybayes)
library(ggdag)

measurement error

Measurement is the principled assignment of numbers to qualitities. And no matter what you measure or how you measure it, you’ll have some error.

Some tools tend to produce very little error (e.g., the length of this table in in inches). Other tools tend to produce more error. In the social sciences, our job is made harder by the fact that we often measure qualities that do not have an objective physical reality, like happiness or well-being. Measurement error is exacerbated when there is little available data.

Unfortunately, our statistic models will assume your measure has no error… unless you can tell the model how much error there is.

In measurement theory, we may assume that

\[ X_i = T_i + \epsilon_i \]

Meaning that for any observation \(i\), the observed score \(X\) on some measure is the sum of the true score \(T\) and error \(\epsilon\). Classical test theory assumes that \(\epsilon_i\) is randomly distributed, but other theories (IRT) disagree. Regardless, we can move forward with the assumption that observed scores are caused by some true score and some error.

marriage example

data(WaffleDivorce, package="rethinking")
d <- WaffleDivorce

rethinking::precis(d) 
                          mean           sd     5.5%        94.5%
Location                   NaN           NA       NA           NA
Loc                        NaN           NA       NA           NA
Population        6.119600e+00 6.876156e+00  0.65780 1.897690e+01
MedianAgeMarriage 2.605400e+01 1.243630e+00 24.26950 2.826100e+01
Marriage          2.011400e+01 3.797905e+00 15.20850 2.649150e+01
Marriage.SE       1.399400e+00 7.969749e-01  0.54950 2.902200e+00
Divorce           9.688000e+00 1.820814e+00  6.66950 1.273050e+01
Divorce.SE        9.618000e-01 5.253675e-01  0.34085 1.893050e+00
WaffleHouses      3.234000e+01 6.578959e+01  0.00000 1.357450e+02
South             2.800000e-01 4.535574e-01  0.00000 1.000000e+00
Slaves1860        7.937834e+04 1.497309e+05  0.00000 4.355531e+05
Population1860    6.287293e+05 7.813127e+05  0.00000 1.903357e+06
PropSlaves1860    9.405132e-02 1.744486e-01  0.00000 4.561000e-01
                       histogram
Location                        
Loc                             
Population              ▇▃▁▁▁▁▁▁
MedianAgeMarriage ▁▁▂▂▃▇▅▃▁▁▂▁▁▁
Marriage              ▁▃▇▇▇▅▂▁▁▁
Marriage.SE             ▁▇▅▃▁▂▁▁
Divorce                 ▂▃▅▅▇▂▃▁
Divorce.SE          ▂▇▇▃▃▂▁▂▂▁▁▁
WaffleHouses            ▇▁▁▁▁▁▁▁
South                 ▇▁▁▁▁▁▁▁▁▂
Slaves1860            ▇▁▁▁▁▁▁▁▁▁
Population1860          ▇▃▂▁▁▁▁▁
PropSlaves1860      ▇▁▁▁▁▁▁▁▁▁▁▁
Code
dag_coords <-
  tibble(name = c("A", "M", "D", "Dobs", "eD"),
         x    = c(1, 2, 2, 3, 4),
         y    = c(2, 3, 1, 1, 1))

dagify(M    ~ A,
       D    ~ A + M,
       Dobs ~ D + eD,
       coords = dag_coords) %>%
  tidy_dagitty() %>% 
  mutate(color = ifelse(name %in% c("D", "eD"), "a", "b")) %>% 
  
  ggplot(aes(x = x, y = y, xend = xend, yend = yend)) +
  geom_dag_point(aes(color = color),
                 size = 10, show.legend = F) +
  geom_dag_text(parse = T, label = c("A", "D", expression(D[obs]), 
  "M",expression(italic(e)[D]))) +
  geom_dag_edges() +
  theme_void()
Code
p1 <- d %>%
  ggplot(aes(x = MedianAgeMarriage, 
             y = Divorce,
             ymin = Divorce - Divorce.SE, 
             ymax = Divorce + Divorce.SE)) +
  geom_pointrange(shape = 20, alpha = 2/3, color="#1c5253") +
  labs(x = "Median age marriage" , 
       y = "Divorce rate")

p2 <-
  d %>%
  ggplot(aes(x = log(Population), 
             y = Divorce,
             ymin = Divorce - Divorce.SE, 
             ymax = Divorce + Divorce.SE)) +
  geom_pointrange(shape = 20, alpha = 2/3, color="#e07a5f") +
  scale_y_continuous(NULL, breaks = NULL) +
  xlab("log population")

p1 | p2

meta analysis

item-response theory